Library Imports
from pyspark.sql import SparkSession
from pyspark.sql import types as T
Template
spark = (
SparkSession.builder
.master("local")
.appName("Section 4 - More Comfortable with SQL?")
.config("spark.some.config.option", "some-value")
.getOrCreate()
)
sc = spark.sparkContext
import os
data_path = "/data/pets.csv"
base_path = os.path.dirname(os.getcwd())
path = base_path + data_path
df = spark.read.csv(path, header=True)
df.toPandas()
| | id | species_id | name | birthday | color |
|---|---|---|---|---|---|
| 0 | 1 | 1 | King | 2014-11-22 12:30:31 | brown |
| 1 | 2 | 3 | Argus | 2016-11-22 10:05:10 | None |
Register DataFrame as a SQL Table
df.createOrReplaceTempView("pets")
What Happened?
The first step in making a `df` queryable with SQL is to register it as a SQL table (a temporary view). This particular function will replace any previously registered local table named `pets` as a result. There are other functions that will register a DataFrame with slightly different behavior; you can check the reference docs if this isn't the desired behavior: docs
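If the replace-on-conflict behavior isn't what you want, a couple of the other registration methods are sketched below; this is only a quick illustration, and the view names `pets_temp` and `pets_global` are made up for the example.

```python
# Sketch of alternative registration methods (view names are illustrative).

# createTempView raises an error if a temp view with this name already
# exists, instead of silently replacing it.
df.createTempView("pets_temp")

# createGlobalTempView registers a view shared across sessions of the same
# Spark application; it lives in the reserved `global_temp` database.
df.createGlobalTempView("pets_global")
spark.sql("SELECT * FROM global_temp.pets_global").show()
```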
Let's Write a SQL Query!
df_2 = spark.sql("""
SELECT
*
FROM pets
WHERE name = 'Argus'
""")
df_2.toPandas()
| | id | species_id | name | birthday | color |
|---|---|---|---|---|---|
| 0 | 2 | 3 | Argus | 2016-11-22 10:05:10 | None |
What Happened?
Once your `df` is registered, call the `sql` function on your `spark session` object. It takes a SQL `string` as an input and outputs a new `df`.
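Because `spark.sql` just returns another DataFrame, you can keep transforming the result or register it as a new view. A minimal sketch (the view name `argus_pets` is made up for illustration):

```python
# The output of spark.sql() is an ordinary DataFrame, so it can be chained
# or registered as yet another view ("argus_pets" is an illustrative name).
argus_df = spark.sql("SELECT * FROM pets WHERE name = 'Argus'")
argus_df.createOrReplaceTempView("argus_pets")

spark.sql("SELECT count(*) AS num_rows FROM argus_pets").show()
```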
Conclusion?
If you're more comfortable with writing `sql` than python/spark code, then you can do so with a spark `df`! We do this by:

- Registering the `df` with `df.createOrReplaceTempView('table')`.
- Calling the `sql` function on your `spark session` with a `sql string` as an input.
- You're done! (A minimal recap is sketched below.)
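Putting both steps together, here is a minimal end-to-end sketch; it assumes the `spark` session and `df` created earlier in this section, and the query itself is just an example.

```python
# Minimal recap, assuming `spark` and `df` from the cells above.
df.createOrReplaceTempView("pets")

result = spark.sql("""
    SELECT name, color
    FROM pets
    WHERE color IS NOT NULL
""")
result.toPandas()
```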